On the Surprising Behavior of Distance Metrics in High Dimensional Space

Authors

  • Charu C. Aggarwal
  • Alexander Hinneburg
  • Daniel A. Keim

Abstract

In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics used to measure the similarity between objects. We specifically examine the behavior of the commonly used Lk norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric (L1 norm) is consistently preferable to the Euclidean distance metric (L2 norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the Lk norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results from both the theoretical and empirical perspectives. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.
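The abstract's central claim can be made concrete with a minimal Python sketch. It assumes uniformly distributed data in the unit hypercube. It measures how "meaningful" nearest-neighbor distances are with the ratio (Dmax - Dmin)/Dmin, which is the gap between a query's farthest and nearest neighbor relative to the nearest one. The helper names `lk_distance`, `relative_contrast`, and `average_contrast` are hypothetical, as are all parameter defaults; they are chosen only for illustration, and this is not the authors' experimental code. Setting k = 1 gives the Manhattan distance, k = 2 gives the Euclidean distance, and 0 < k < 1 gives a fractional distance.

```python
import random


def lk_distance(x, y, k):
    """Distance induced by the Lk norm: (sum_i |x_i - y_i|^k)^(1/k).

    k = 1 is Manhattan, k = 2 is Euclidean. For 0 < k < 1 this is a
    "fractional" distance; it is not a true metric because it can
    violate the triangle inequality.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def relative_contrast(query, points, k):
    """(Dmax - Dmin) / Dmin over the distances from `query` to each point."""
    dists = [lk_distance(query, p, k) for p in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def average_contrast(dim, k, n_points=200, trials=10, seed=0):
    """Mean relative contrast over `trials` uniform datasets in [0, 1]^dim.

    The fixed seed means every k is evaluated on the same datasets.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        query = [rng.random() for _ in range(dim)]
        total += relative_contrast(query, points, k)
    return total / trials


if __name__ == "__main__":
    # Compare fractional, Manhattan, Euclidean, and L3 across dimensions.
    for dim in (2, 20, 200):
        cells = ", ".join(
            f"k={k}: {average_contrast(dim, k):.3f}" for k in (0.5, 1, 2, 3)
        )
        print(f"dim={dim:>3}  {cells}")
```

Every k is evaluated on the same seeded datasets, so the comparison across k is paired. On uniform data one would expect the contrast to shrink as `dim` grows and to be larger for smaller k, which matches the trend the abstract describes. Exact values depend on the seed and the sample sizes.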

Similar resources

On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the ...

Developing 3 dimensional model for estimation of acoustic power in urban pathways in geo-spatial information system framework

Around the world, traffic growth is causing increasing air and noise pollution. Noise levels in a given area are affected by street traffic as well as other contributing factors, such as existing infrastructure and industrial centers. The purpose of this research is to model and estimate the amount of acoustic emission in the streets of Tehran's third district, using the 3D spatial info...

A topology control algorithm for autonomous underwater robots in three-dimensional space using PSO

Recently, data collection from the seabed by means of underwater wireless sensor networks (UWSNs) has attracted considerable attention. Autonomous underwater vehicles (AUVs) are increasingly used as UWSNs in underwater missions. Events and environmental parameters in underwater regions have a stochastic nature. The target area must be covered by sensors to observe and report events. A ‘topology cont...

SOME POINTS ON CASIMIR FORCES

Casimir forces of massive fermionic Dirac fields are calculated for a parallel-plate geometry in d spatial dimensions, imposing bag-model boundary conditions. It is shown that in the regime ma >> 1, where m is the mass of the field quanta and a is the separation distance between the plates, the force equals the Casimir force of massive bosonic fields for each degree of freedom. We argue this equalit...

Einstein structures on four-dimensional neutral Lie groups

When Einstein was developing the theory of general relativity by removing the constraints of special relativity (especially the geometric relationship between space and time), he understood that the first limitation of special relativity is that it ignores changes over time, because in special relativity only the curvature of space was considered. Therefore, tensor calculations should be to...

Publication date: 2000